A. The Agentic Shift

From “user-in-the-loop” to “user-on-the-loop”

Agenda

  • A. The Agentic Shift — Why agents matter (and when they don’t)
  • B. The Continuum — Four levels of AI system autonomy
  • C. The Decision Framework — When NOT to use an agent
  • D. ReAct Theory — Thought -> Action -> Observation
  • E. Wrap-up — Key takeaways & lab preview

What Is an “Agent”?

An agent is an LLM that autonomously decides its next step in a loop, with access to tools and memory.

The defining difference from tool calling:

| Pattern | Who decides the next step? |
|---|---|
| Tool Calling | You (predefined workflow) |
| Agent | The LLM (at runtime) |

The Core Insight

An agent is not magic. It is a while loop where the LLM decides what to do next — planning, acting, and adapting based on observations.

The Hype vs Reality

The Promise

  • Autonomous task completion
  • Multi-step reasoning
  • Tool orchestration
  • Self-correction

The Reality

  • Unpredictable costs ($0.02 or $2.00?)
  • Infinite loops
  • Hallucinated tool calls
  • Debugging nightmares

Your first job as an AI engineer: Be a discerning architect. Not every problem needs an agent.

B. The Continuum

Four levels from static to fully autonomous

The AI System Spectrum

graph LR
    A["Static LLM<br/>Call"]
    A --> B["Tool<br/>Calling"]
    B --> C["Single Agent<br/>+ Planning"]
    C --> D["Multi-Agent<br/>System"]

    style A fill:#00C9A7,stroke:#1C355E,color:#1C355E
    style B fill:#9B8EC0,stroke:#1C355E,color:#1C355E
    style C fill:#FF7A5C,stroke:#1C355E,color:#1C355E
    style D fill:#FF7A5C,stroke:#1C355E,color:#1C355E

Each step to the right adds: more autonomy, more cost, more complexity, less predictability.

Level 1: Static LLM Call

A single prompt -> a single response. No memory, no tools, no loops.

from litellm import completion  # assumed SDK; the OpenAI client has the same shape

response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Classify sentiment: 'Great product!'"}]
)
# response.choices[0].message.content -> "Positive"

  • Latency: ~1 second
  • Cost: ~$0.01
  • Use when: Tasks are deterministic and context-independent

Level 2: Tool Calling

The LLM decides which function to call and with what arguments.

sequenceDiagram
    participant U as User
    participant L as LLM
    participant T as Tool

    U->>L: "What's the weather in Riyadh?"
    L->>T: get_weather(location="Riyadh")
    T-->>L: {"temp": 38, "condition": "sunny"}
    L->>U: "It's 38°C and sunny in Riyadh."

Not Yet an Agent

Tool calling is a controlled extension — the control flow is still linear. One request, one tool call, one answer.
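The sequence above can be sketched locally. The tool schema below follows the OpenAI function-calling format; `get_weather` is a stub standing in for a real weather API, and `dispatch` is a hypothetical helper that routes the model's emitted call to Python code.

```python
import json

# Tool schema in the OpenAI function-calling format
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}

def get_weather(location: str) -> dict:
    # Stub standing in for a real weather API call
    return {"temp": 38, "condition": "sunny", "location": location}

TOOLS = {"get_weather": get_weather}

def dispatch(name: str, arguments: str) -> dict:
    """Route the model's tool call (name + JSON arguments) to Python code."""
    return TOOLS[name](**json.loads(arguments))

# The LLM would emit: name="get_weather", arguments='{"location": "Riyadh"}'
result = dispatch("get_weather", '{"location": "Riyadh"}')
```

Note the control flow: your code decides when to call `dispatch` — the model only picks the function and its arguments.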

Level 3: Single Agent with Planning

The agent plans a sequence of steps, executes them with tools, and adapts based on results.

# The agent autonomously decides:
# Step 1: Search "transformer architectures 2023"
# Step 2: Read top 3 results
# Step 3: Extract key contributions
# Step 4: Synthesize into report
# (Each step is an LLM call + tool execution)
  • Latency: 5-30 seconds (multiple LLM calls)
  • Cost: $0.10-$2.00 per query
  • Must implement timeouts and max-step limits
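Those two safety limits can be sketched as a small wrapper. `step` here is a stub for one agent iteration; the names are illustrative, not a library API.

```python
import time

def run_with_limits(step, max_steps: int = 10, timeout_s: float = 30.0):
    """Call step() until it returns an answer, the step cap, or the deadline."""
    deadline = time.monotonic() + timeout_s
    for n in range(max_steps):
        if time.monotonic() > deadline:
            return f"Timed out after {n} steps"
        answer = step()          # one Thought/Action/Observation iteration
        if answer is not None:
            return answer
    return "Max steps reached"

# Usage with a stub agent that "finishes" on its third step:
calls = iter([None, None, "done"])
result = run_with_limits(lambda: next(calls))  # "done"
```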

Level 4: Multi-Agent System

Specialized agents collaborate — Researcher, Analyst, Writer — coordinated by an Orchestrator.

graph TB
    O[Orchestrator] --> R[Researcher]
    O --> A[Analyst]
    O --> W[Writer]
    R -->|findings| O
    A -->|analysis| O
    W -->|draft| O
    A -.->|review| W

    style O fill:#1C355E,stroke:#00C9A7,color:white
    style R fill:#00C9A7,stroke:#1C355E,color:#1C355E
    style A fill:#9B8EC0,stroke:#1C355E,color:#1C355E
    style W fill:#FF7A5C,stroke:#1C355E,color:#1C355E

  • Most complex to debug and monitor
  • Justify only for multi-faceted tasks
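The collaboration pattern above can be sketched with plain functions standing in for the agents — in a real system each would be its own LLM loop, and the orchestrator itself would usually be one too.

```python
# Stub "agents": each returns a string where a real agent would run an LLM loop
def researcher(task: str) -> str:
    return f"findings on {task}"

def analyst(findings: str) -> str:
    return f"analysis of {findings}"

def writer(analysis: str) -> str:
    return f"draft based on {analysis}"

def orchestrate(task: str) -> str:
    """Sequential orchestrator: route each agent's output to the next."""
    findings = researcher(task)
    analysis = analyst(findings)
    return writer(analysis)

report = orchestrate("EU AI policy")
```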

The Spectrum at a Glance

| Level | Architecture | Latency | Cost | When to Use |
|---|---|---|---|---|
| 1 | Static LLM | ~1s | ~$0.01 | Classification, extraction |
| 2 | Tool Calling | ~3s | ~$0.05 | Single action + response |
| 3 | Single Agent | 5-30s | $0.10-$2.00 | Multi-step research |
| 4 | Multi-Agent | 10-60s | $0.50-$5.00 | Complex, multi-faceted tasks |

Career Insight: Demonstrating you understand this continuum and can justify architectural choices is more impressive than saying “I built an agent.”

C. The Decision Framework

When NOT to use an agent

The Three Tradeoffs

Every step up the continuum increases three costs:

Reliability

Each LLM call has a chance of failure. More calls = more failure points.

Single call: 95% reliable. 5-step agent: 0.95^5 ≈ 77% reliable.
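The compounding is simple exponentiation: assuming each step independently succeeds with probability p, an n-step chain succeeds with probability p**n.

```python
def chain_reliability(p: float, n: int) -> float:
    """Probability that n independent steps, each p reliable, all succeed."""
    return p ** n

print(round(chain_reliability(0.95, 1), 2))   # 0.95
print(round(chain_reliability(0.95, 5), 2))   # 0.77
print(round(chain_reliability(0.95, 10), 2))  # 0.6
```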

Cost

Agents multiply token usage: a 10-step agent makes at least 10x the LLM calls of a single request, and each call resends the growing conversation context.

Budget per query matters.

Latency

Each planning step adds 1-3 seconds. Users notice after 5 seconds.

Multi-agent can hit 30-60s.

The Decision Tree

graph LR
    Q1{"Does the task need<br/>multiple steps?"}
    Q2{"Does it need<br/>external data or actions?"}
    Q3{"Can steps be<br/>pre-determined?"}
    Q4{"Do you need independent<br/>quality checks?"}


    L2["Tool Calling"]
    L3["Single Agent"]
    L4["Multi-Agent"]


    Q1 -->|Yes| Q2
    Q2 -->|Yes| Q3
    Q3 -->|Yes| L2
    Q3 -->|"No — needs planning"| Q4
    Q4 -->|No| L3
    Q4 -->|Yes| L4

    style L2 fill:#00C9A7,stroke:#1C355E,color:#1C355E

    style L3 fill:#FF7A5C,stroke:#1C355E,color:#1C355E
    style L4 fill:#FF7A5C,stroke:#1C355E,color:#1C355E
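
The tree can be encoded directly as a function. One caveat: the diagram omits the "No" branches of the first two questions, so mapping those to a static LLM call is an assumption here, not part of the original tree.

```python
def choose_architecture(multi_step: bool, needs_external: bool,
                        predetermined: bool, needs_review: bool) -> str:
    """Walk the decision tree: Q1 multi-step? Q2 external? Q3 fixed steps? Q4 review?"""
    if not multi_step or not needs_external:
        return "Static LLM"      # assumed fallback (not drawn in the tree)
    if predetermined:
        return "Tool Calling"    # Q3: steps known upfront
    if needs_review:
        return "Multi-Agent"     # Q4: independent quality checks needed
    return "Single Agent"

choice = choose_architecture(True, True, False, False)  # "Single Agent"
```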

Real-World Examples

| Task | Right Architecture | Why |
|---|---|---|
| “Summarize this email” | Static LLM | One pass, no tools needed |
| “Book a meeting at 2pm” | Tool Calling | One tool, predictable flow |
| “Research X and write a report” | Single Agent | Multi-step, needs planning |
| “Compare EU vs US policy, with quality review” | Multi-Agent | Independent research + review gate |

The Golden Rule

Start with the simplest architecture that works. Only upgrade when you hit a specific bottleneck: context overflow, quality degradation, or throughput limits.

D. ReAct Theory

Thought -> Action -> Observation

What Is ReAct?

ReAct (Reasoning + Acting) was introduced by Yao et al. (2022). It is one of the most widely adopted agent loop patterns.

The core idea: instead of calling tools blindly, the agent explicitly reasons about what it observes and what to do next at every step.

The ReAct Loop

graph LR
    T["THOUGHT<br/>Agent reasons about<br/>current state"] --> A["ACTION<br/>Agent calls a tool<br/>with arguments"]
    A --> O["OBSERVATION<br/>System returns<br/>tool result"]
    O --> T
    O --> F["FINAL ANSWER<br/>Agent has enough<br/>info to respond"]

    style T fill:#1C355E,stroke:#00C9A7,color:white
    style A fill:#00C9A7,stroke:#1C355E,color:#1C355E
    style O fill:#9B8EC0,stroke:#1C355E,color:#1C355E
    style F fill:#FF7A5C,stroke:#1C355E,color:#1C355E

Each iteration is one “step” in the agent’s execution.

ReAct in Action — Example

Query: “What is the population of the capital of France?”

| Step | Phase | Content |
|---|---|---|
| 1 | Thought | I need to find the capital of France first, then look up its population. |
| 1 | Action | search("capital of France") |
| 1 | Observation | “The capital of France is Paris.” |
| 2 | Thought | Now I know it’s Paris. I need to find the population of Paris. |
| 2 | Action | search("population of Paris") |
| 2 | Observation | “The population of Paris is approximately 2.1 million.” |
| 3 | Final Answer | “The population of the capital of France (Paris) is approximately 2.1 million.” |
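In the classic text-format trace (as in the original ReAct paper), the Action line has to be parsed out of the model's output before the tool can run. A minimal sketch, assuming the `Action: tool("arg")` shape used above — real frameworks usually rely on structured tool calls instead of regex:

```python
import re

# Matches lines like: Action: search("population of Paris")
ACTION_RE = re.compile(r'^Action:\s*(\w+)\((["\'])(.*)\2\)\s*$')

def parse_action(line: str) -> tuple[str, str]:
    """Return (tool_name, argument) from a text-format ReAct action line."""
    m = ACTION_RE.match(line)
    if not m:
        raise ValueError(f"Not an action line: {line!r}")
    return m.group(1), m.group(3)

tool, arg = parse_action('Action: search("population of Paris")')
# tool -> "search", arg -> "population of Paris"
```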

Why Not Just Call Tools Directly?

Without explicit reasoning, agents fail in predictable ways:

Without ReAct

  • Skips steps, jumps to conclusions
  • Calls irrelevant tools
  • Cannot explain its decisions
  • Debugging is impossible

With ReAct

  • Plans before acting
  • Selects tools based on reasoning
  • Full audit trail of “why”
  • Every step is traceable

The Thought step is not overhead — it is the entire point. It makes agents debuggable.

The Agent Core Components

Every agent has three building blocks:

graph TB
    subgraph Agent["Agent System"]
        M["MEMORY<br/>Conversation history<br/>+ past actions"]
        P["PLANNING<br/>ReAct reasoning<br/>+ step decomposition"]
        A["ACTION<br/>Tool execution<br/>+ result handling"]
    end

    M --> P
    P --> A
    A --> M

    style M fill:#00C9A7,stroke:#1C355E,color:#1C355E
    style P fill:#9B8EC0,stroke:#1C355E,color:#1C355E
    style A fill:#FF7A5C,stroke:#1C355E,color:#1C355E

  • Memory: Persists context across turns (conversation buffer, summaries, vector memory)
  • Planning: Reasons about what to do next (ReAct, Plan-and-Execute, CoT)
  • Action: Executes tools and returns results safely
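The simplest form of the Memory component — a fixed-size conversation buffer — can be sketched in a few lines. The class name and API here are illustrative, not from any particular framework.

```python
from collections import deque

class BufferMemory:
    """Keep only the last N messages to bound context size."""

    def __init__(self, max_messages: int = 20):
        self._buf = deque(maxlen=max_messages)  # oldest entries evicted first

    def add(self, role: str, content: str) -> None:
        self._buf.append({"role": role, "content": content})

    def messages(self) -> list:
        return list(self._buf)

mem = BufferMemory(max_messages=2)
mem.add("user", "hi")
mem.add("assistant", "hello")
mem.add("user", "bye")   # "hi" is evicted; buffer holds the last two
```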

ReAct vs Plan-and-Execute

Two common agent patterns — know the difference:

ReAct (Reactive)

graph LR
    T[Thought] --> A[Action]
    A --> O[Observation]
    O --> T
    O --> F[Final Answer]

  • Plans one step at a time
  • Adapts based on observations
  • More flexible, less predictable

Plan-and-Execute

graph LR
    P[Plan All Steps] --> E1[Step 1]
    E1 --> E2[Step 2]
    E2 --> E3[...]
    E3 --> F[Final Answer]

  • Plans everything upfront
  • Executes sequentially
  • More predictable, less adaptive
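The Plan-and-Execute control flow in miniature, with `plan` and `do` as stubs for the planning and execution LLM calls — the key difference from ReAct is that the step list is frozen before execution starts:

```python
def plan_and_execute(query: str, plan, do) -> list:
    """One upfront planning call, then fixed sequential execution."""
    steps = plan(query)                 # plan everything before acting
    return [do(s) for s in steps]       # no re-planning between steps

out = plan_and_execute(
    "report on X",
    plan=lambda q: ["search", "summarize"],  # stub planner
    do=lambda step: f"{step}: done",         # stub executor
)
```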

The Agent Loop — Demystified

At its core, every agent is this pattern:

def run(query: str, max_steps: int = 10) -> str:
    messages = [
        {"role": "system", "content": system_prompt}, 
        {"role": "user", "content": query}
    ]

    for step in range(max_steps):
        response = llm(messages)             # 1. Ask the LLM
        messages.append(response.message)    # 2. Add assistant msg to history

        if response.tool_calls:              # 3. If model wants to act...
            for call in response.tool_calls:
                result = execute_tool(call)
                messages.append({            # 4. Add tool result(s)
                    "role": "tool", 
                    "tool_call_id": call.id, 
                    "content": str(result)
                })
        else:                                # 5. If no more tools, final answer
            return response.message.content

    return "Max steps reached"               # Safety limit

That’s it. Everything else — memory, tracing, parallelism — is built on top of this loop.
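A runnable miniature of the same loop, with a scripted fake LLM so the control flow can be traced end-to-end. Everything here is a stub (no API calls, no real SDK types); only the loop shape matches the sketch above.

```python
import json
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    id: str
    name: str
    arguments: str      # JSON string, as tool-calling APIs emit it

@dataclass
class FakeResponse:
    content: str = ""
    tool_calls: list = field(default_factory=list)

# Scripted turns: the "model" first requests a tool, then answers.
SCRIPT = [
    FakeResponse(tool_calls=[ToolCall("1", "add", '{"a": 2, "b": 3}')]),
    FakeResponse(content="2 + 3 = 5"),
]

def fake_llm(messages, _turns=iter(SCRIPT)):
    return next(_turns)

def execute_tool(call: ToolCall):
    tools = {"add": lambda a, b: a + b}
    return tools[call.name](**json.loads(call.arguments))

def run(query: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": query}]
    for _ in range(max_steps):
        response = fake_llm(messages)
        if response.tool_calls:                  # model wants to act
            for call in response.tool_calls:
                messages.append({"role": "tool",
                                 "tool_call_id": call.id,
                                 "content": str(execute_tool(call))})
        else:                                    # no tools -> final answer
            return response.content
    return "Max steps reached"

answer = run("What is 2 + 3?")
```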

E. Wrap-up

Key Takeaways

  1. Agents are while loops with state and reasoning — not magic
  2. The Continuum tells you which architecture fits your problem
  3. Start simple — upgrade only when you hit a specific bottleneck
  4. ReAct (Thought -> Action -> Observation) makes agents debuggable
  5. Plan-and-Execute plans upfront; ReAct adapts step-by-step

Lab Preview: Building the Brain

Part 1: Build the Loop

  • Implement the agent while loop
  • Connect tools to the LLM
  • Handle tool calls and responses

Part 2: Add ReAct

  • Add Thought/Action/Observation reasoning
  • Implement step limits and timeouts
  • Trace and debug agent decisions

Time: 75 minutes

Questions?

Session 1 Complete